ECEN 4532: DSP Laboratory
Professor Michael Perkins
Lab 2
Jesse Tao
February 18, 2018
Contents
1. Introduction
2. Rhythm
2.1. Spectral Decomposition
2.2. Spectrum histogram
2.3. Similarity Matrix
2.4. A first estimate of the rhythm
2.5. Periodicity Histograms
2.6. A better estimate of the rhythm
2.7. Rhythmic variations over time
3. Tonality and Chroma
3.1. The equal-tempered scale
3.2. Chroma
3.2.1. Step 1: spectral analysis and peak detection
3.2.2. Step 2: Assignment of the peak frequencies to nearby semitones
3.2.3. Step 3: The Pitch Class Profile: a weighted sum of the semitones
3.2.4. Step 4: Normalize the Pitch Class Profile
4. Conclusion
A. Appendix Figures
A.1. Spectrum Histograms
A.2. Similarity Matrices
A.3. Rhythm Index B(l)
A.4. Autocorrelation AC(l)
A.5. Autocorrelation AC(l,m)
A.6. NPCP
B. MATLAB Code
B.1. MATLAB code used to compile figures except for AutoCorrelation(l,m)
B.2. MATLAB code used to compile figures for AutoCorrelation(l,m)
B.3. MATLAB functions
B.3.1. melBank.m
B.3.2. mfcc.m
B.3.3. spectrumHistogram.m
B.3.4. simMatrix.m
B.3.5. rhythmIndex.m
B.3.6. medianS.m
B.3.7. autoC.m
B.3.8. rhythmVar.m
B.3.9. normPCP.m
1 Introduction
In this lab, we explore further digital signal processing techniques for content-based music information retrieval. The two main features we investigate are rhythm and tonality. We will be using the same tracks as in lab 1 to investigate these features.
These features are useful across the full frequency range of music, whereas the features investigated in lab 1 were mainly informative for low-frequency music.
2 Rhythm
Musical rhythm can be loosely defined as the presence of repetitive patterns in the temporal
structure of music. We will implement several techniques to detect the presence of rhythmic
structures in an audio track.
Given an audio segment of duration T = 24 seconds, the goal is to compute a score that
measures how often spectral patterns are repeated. Formally, we will compute a vector, B, such
that B(lag) quantifies the presence of similar spectral patterns for frames that are lag frames
apart. The lag associated with the largest entry in the array B is a good candidate for the period
in the rhythmic structure.
We explain below two approaches to compute the array B.
2.1 Spectral decomposition
We’ll first compute the mfcc coefficients for a frame as explained in lab 1, thereby obtaining a vector, and then modify each coefficient of the mfcc vector with a nonlinearity. The purpose of the non-linearity is to account for the non-linear perception of loudness by the human auditory system: loudness does not increase linearly with sound intensity (measured, for instance, in Watts/m^2). A good approximation to the perceptual response of the human auditory system within a given critical band of hearing is obtained by taking the logarithm of the energy in that band. We therefore define an mfcc vector in decibels for each frame as

mfcc_dB(p, n) = 10 log_10(mfcc(p, n)),  p = 1,...,nbanks,  (1)

where the logarithm is taken of each element.
Later you will write a function that returns an mfcc array, mfcc(k,n), comprising the mfcc vectors corresponding to each frame of an audio track. This array will have dimensions

nbanks × N_f,  (2)

where N_f = f_s T/N. Here N_f is the number of frames, N = 512 is the frame size in samples, f_s is the audio sampling rate, and K = N/2 + 1 is the number of frequencies. As in lab 1, we choose nbanks = 40 as the number of mel bands. The entire analysis described below is performed at the level of frames.
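As a language-neutral illustration of the dB conversion described above, here is a small NumPy sketch (the MATLAB implementation used for this report is in Appendix B.3.2; the array shape and the small floor that guards against log(0) are assumptions):

```python
import numpy as np

def mfcc_to_db(mfcc, floor=1e-12):
    """Convert non-negative mel-band energies to decibels, elementwise.

    mfcc: (nbanks, Nf) array; the floor avoids log(0) on silent frames.
    """
    return 10.0 * np.log10(np.maximum(mfcc, floor))

# 40 mel bands x 5 frames of synthetic energies
rng = np.random.default_rng(0)
m = rng.uniform(0.1, 100.0, size=(40, 5))
m_db = mfcc_to_db(m)
```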
2.2 Spectrum histogram
We propose to construct a visual representation of a musical track by counting how often a
given note is played at a given amplitude (we consider each index of the mfcc vector to be a
“note”, and measure amplitude in dB).
Assignment
1. Modify your function that computes the mfcc to return the mfcc in decibels.
2. Construct a two-dimensional spectrum histogram (a matrix of counts), with 40
columns one for each mfcc index, and 81 rows one for each amplitude level,
measured in dB. The amplitude range we care about is from -20 dB to 60 dB, so clip
the mel values to that range. In other words, values less than -20 dB map to -20,
values greater than 60 dB map to 60, and values in between map to the nearest
integer in the set {−20, −19, −18, ..., 60}. Once you have the counts, normalize the
histogram such that the sum along each column (i.e. for each “note”) is one.
3. Display, using the MATLAB function imagesc, the spectrum histogram for the 12 audio
tracks supplied for the first lab.
1. The MATLAB code for the mfcc function, modified to return its values in decibels, is in the Appendix in section B.3.2.
2, 3. In order to visualize the mfcc, a spectrum histogram is created which maps each mfcc
index and the corresponding amplitude levels to a color displaying how often each mfcc index
occurred at the respective volume. Below is the histogram for ‘track201-classical’: aside
from the very low sounds, most of the pitches occur at around the same volume. Spectrum
histograms for all other tracks are included at the end of the report in section A.1. The code
used to create the spectrum histograms is in section B.3.3.
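The clip-count-normalize procedure from the assignment can be sketched as follows (a NumPy sketch, independent of the MATLAB code in Appendix B.3.3; rounding behavior at ties is an assumption):

```python
import numpy as np

def spectrum_histogram(mfcc_db):
    """Per mel band, count how often each integer dB level in [-20, 60] occurs.

    mfcc_db: (nbanks, Nf) array in dB. Returns an (81, nbanks) matrix
    whose columns each sum to one.
    """
    # Clip to [-20, 60] dB and round to the nearest integer level
    levels = np.clip(np.rint(mfcc_db), -20, 60).astype(int)
    nbanks = mfcc_db.shape[0]
    hist = np.zeros((81, nbanks))
    for band in range(nbanks):
        for v in levels[band]:
            hist[v + 20, band] += 1   # row 0 <-> -20 dB, row 80 <-> 60 dB
    return hist / hist.sum(axis=0)    # normalize each column ("note")

h = spectrum_histogram(np.array([[-25.0, 0.4, 0.6, 70.0]]))
```

Out-of-range values (-25 dB and 70 dB above) land in the edge bins, matching the clipping rule in the assignment.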
2.3 Similarity matrix
The first stage involves computing a similarity matrix defined as follows

S(i,j) = Σ_{k=1}^{nbanks} mfcc(k,i) mfcc(k,j) / ( ‖mfcc(:,i)‖ ‖mfcc(:,j)‖ )  (3)

Here we are considering the mel spectral signature of each frame to be a vector, and using the dot
product to compute S(i,j), the cosine of the angle between the vectors corresponding to frames i and j.
For this exercise we use the non-dB version of the mfcc coefficients (i.e., we don’t first take 10 times the
base-10 log of the mfcc values). We note that mfcc is always positive in this case, and therefore 0 ≤ S(i,j) ≤ 1.
The similarity S(i,j) is large, S(i,j) ≈ 1, if the notes being played in frame i are similar to the notes being
played in frame j. We note that this similarity is independent of the loudness of the music (the cosine
between two vectors does not depend on their magnitudes).
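The cosine-similarity computation in (3) can be sketched compactly (a NumPy sketch; the MATLAB version in Appendix B.3.4 uses explicit loops):

```python
import numpy as np

def similarity_matrix(mfcc):
    """Cosine similarity between the mel vectors of every pair of frames.

    mfcc: (nbanks, Nf) array of non-dB mel energies.
    Returns S with S[i, j] = cosine of the angle between frames i and j.
    """
    norms = np.linalg.norm(mfcc, axis=0)   # one norm per frame (column)
    unit = mfcc / norms                    # normalize each column
    return unit.T @ unit                   # S[i, j] = <u_i, u_j>

mf = np.array([[1.0, 2.0, 0.0],
               [0.0, 0.0, 3.0]])
S = similarity_matrix(mf)
```

Because the mel energies are non-negative, every entry of S lies in [0, 1], matching the bound stated above.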
Assignment
4. Implement the computation of the similarity matrix, and display the similarity matrices
(color-coded) for the 12 audio tracks supplied for the first lab. Use colormap('jet') and
imagesc(). See Fig. 1 for an example of a similarity matrix.
4. The code for the similarity matrix is in section B.3.4, and the similarity matrix for ‘track201-classical’ is shown below. All other similarity-matrix figures and comments for the other tracks are included at the end of the report in section A.2.
In the figure it can be seen that the classical track is mostly green and blue, which indicates
that many different notes are being played. As can be heard in the music, classical
music has a wide spread of notes played throughout, so this physically makes sense for this track.
2.4 A first estimate of the rhythm
We note that the entries S(n, n+l) in the similarity matrix (3) are always at a distance (time lag) of l
frames. In order to detect the presence of patterns that are repeated every l frames we compute

B(l) = (1/(N_f − l)) Σ_{n=1}^{N_f − l} S(n, n+l)  (4)

B(l) is the normalized sum, i.e. the average, of all the entries on the l-th upper diagonal. Indeed, B(0) is the average of all the entries on the
main diagonal. Similarly, B(1) is the average of the entries on the first upper diagonal, etc. We note that

0 ≤ B(l) ≤ 1.  (5)

The index l ≥ 1 associated with the largest value of B(l) corresponds to the presence of a rhythmic
period of l × N/f_s seconds.
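Reading (4) as the average over each upper diagonal, consistent with the bound (5), the rhythm index can be sketched as (a NumPy sketch, independent of the MATLAB code in Appendix B.3.5):

```python
import numpy as np

def rhythm_index(S):
    """B[l]: average of the l-th upper diagonal of the similarity matrix."""
    Nf = S.shape[0]
    return np.array([np.diagonal(S, offset=l).mean() for l in range(Nf)])

S = np.array([[1.0, 0.2, 0.9],
              [0.2, 1.0, 0.4],
              [0.9, 0.4, 1.0]])
B = rhythm_index(S)
```

For this toy matrix the largest B(l) with l ≥ 1 occurs at lag 2, the candidate rhythmic period.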
Assignment
5. Implement the computation of the rhythm index B(l) defined in (4). Plot the vector B
as a function of the lag l = 0,...,N_f − 1 for the 12 tracks. Comment on the presence, or
absence, of a strong rhythmic pattern.
Hint: to debug your function, you can use track-396, which should display strong
rhythmic structure.
5. The code for the computation of the rhythm index is in section B.3.5, and the plot of the rhythm
index of ‘track396’ is shown below. The rhythm index and comments for
all other tracks are included at the end of the report in section A.3. There is a very periodic
pattern throughout the song. The periodicity of the graph may hint at the underlying
subdivision of the beat. This makes sense for track 396, as it has a very repetitive thumping bass
line throughout the song. The spikes on the left side of the graph correspond to faster notes,
while slower notes are represented further to the right.
2.5 Periodicity Histograms
Many entries in the similarity matrix are small, and are not useful to quantify the presence of
rhythm. Furthermore, it would be interesting to be able to quantify the rhythm as a function of
tempo. We can define a notion of tempo, measured in beats per minute, by taking the inverse
of the rhythmic period, l · N/f_s. We get

BPM = 60 f_s / (l N)  (6)

Note the multiplication by 60 to convert beats-per-second to BPM. Re-arranging this formula
we can find the l (frame lag) corresponding to a given BPM of interest as

l = round( 60 f_s / (BPM · N) )  (7)
Now we’ll describe an algorithm to compute the probability of finding a large similarity at a
given tempo of interest.
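The conversions (6) and (7) are one-liners. In the sketch below the sampling rate f_s = 11025 Hz is an assumption (any rate works), and the lag in (7) is rounded to a whole frame:

```python
def lag_to_bpm(l, fs=11025, N=512):
    """Tempo for a frame lag, eq. (6): BPM = 60*fs/(l*N)."""
    return 60.0 * fs / (l * N)

def bpm_to_lag(bpm, fs=11025, N=512):
    """Frame lag for a tempo, eq. (7), rounded to the nearest whole frame."""
    return round(60.0 * fs / (bpm * N))
```

For example, at fs = 11025 Hz and N = 512, a tempo of 120 BPM corresponds to a lag of about 11 frames.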
Assignment
6. Compute the median of the matrix S; call it S_med.
7. Using a range of 40 to 240 BPM in increments of size 5 BPM, compute the l
corresponding to each BPM. Next, for each of these l values (and hence for each BPM
of interest), compute the number of times that S(i, i+l) is greater
than S_med. Your function should return a vector of size 41, where each element
corresponds to the number of counts seen at one of the BPMs.
8. Using your array of counts, construct and display a normalized histogram, PH, that
shows the relative number of times that the similarity at that tempo was greater than
the median similarity S_med. (This is just the vector that you computed in the
previous question, normalized by the sum of the vector elements.)
Display the histograms for the 12 tracks representing the six different musical genres.
6. The code to compute the median of S is in section B.3.6.
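Since the Conclusion notes that problems 7 and 8 were not implemented, here is a hedged sketch of how the counts and the normalized histogram PH could be computed; the sampling rate fs = 11025 Hz and frame size N = 512 are assumptions:

```python
import numpy as np

def periodicity_histogram(S, fs=11025, N=512):
    """Counts of S[i, i+l] above the median of S, for 40..240 BPM in 5 BPM steps.

    Returns the normalized histogram PH (41 bins, one per BPM of interest).
    """
    med = np.median(S)
    Nf = S.shape[0]
    counts = np.zeros(41)
    for b, bpm in enumerate(range(40, 245, 5)):
        l = round(60.0 * fs / (bpm * N))   # lag for this tempo, eq. (7)
        if l < Nf:
            counts[b] = np.sum(np.diagonal(S, offset=l) > med)
    total = counts.sum()
    return counts / total if total > 0 else counts

rng = np.random.default_rng(1)
A = rng.uniform(0.0, 1.0, size=(200, 200))
S = (A + A.T) / 2        # a symmetric stand-in for a similarity matrix
PH = periodicity_histogram(S)
```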
2.6 A better estimate of the rhythm
Above we looked at the similarity between frames that are a fixed lag, l, apart in time by computing the
vector B(l). However, it is possible to look for patterns within the similarity matrix related to l in other
ways. In particular, we can ask the following question: if the spectral signature at time i is related to the
signature at time j (i.e. S(i,j) is relatively large), is it also related to the spectral signature at time j + l (i.e.
is S(i,j + l) also relatively large?). A lag, l, will be a candidate for a rhythmic period if there are many i and
j combinations such that if S(i,j) is large, S(i,j + l) is also large. This leads us to the autocorrelation.
First, consider a single row, i, in the similarity matrix and the autocorrelation function for that row

AC_i(l) = (1/(N_f − l)) Σ_{j=1}^{N_f − l} S(i,j) S(i,j+l)  (8)

For a fixed value of i, this summation will tend to be large at values of l which cause peaks in row i to
line up with peaks in the same row translated by the amount l. In other words, this summation will be
large at values of l corresponding to a periodicity in row i of the S matrix.
We can get a global measure by averaging these values together over all starting times i

AC(l) = (1/N_f) Σ_{i=1}^{N_f} AC_i(l)  (9)
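Reading (8) and (9) as averages, so the result does not depend on the number of terms (a normalization assumption), the global autocorrelation can be sketched as (NumPy; the MATLAB version is in Appendix B.3.7):

```python
import numpy as np

def autocorrelation(S):
    """AC[l]: mean over rows i of the mean over j of S[i, j] * S[i, j+l]."""
    Nf = S.shape[0]
    AC = np.zeros(Nf)
    for l in range(Nf):
        # S[i, j] * S[i, j+l] for every row i and every valid j at once
        AC[l] = (S[:, :Nf - l] * S[:, l:]).mean()
    return AC

AC = autocorrelation(np.eye(4))
```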
Assignment
9. Implement the computation of the rhythm index AC(l) defined in (9). Plot the vector
AC as a function of the lag l = 0,1,...,N_f − 1 for the 12 tracks.
Comment on the presence, or absence, of a strong rhythmic pattern.
See Fig. 2 for an example of a plot of the rhythm index.
9. MATLAB code for the implementation of AC(l) is in section B.3.7.
The plot of AC(l) for ‘track396-electronic’ is shown in the figure below. The plot is very similar to
the plot of B(l), so for this particular track the choice between the two methods comes down to
which is faster to compute. Comments and plots for the other tracks can be found at the end of the report in section
A.4.
2.7 Rhythmic variations over time
Here we are interested in studying the dynamic changes in the rhythm. For this purpose, we consider
short time windows formed by 20 frames (approximately 1 second) over which we compute a vector AC
of size 20,

AC(l,m) = (1/20) Σ_{i=20m+1}^{20(m+1)} AC_i(l),  (11)

for l = 0,...,19, and m = 0,...,N_f/20 − 1.
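Reading (11) as the row autocorrelation of (8) averaged over the 20 rows of window m (the exact normalization is an assumption), a sketch is:

```python
import numpy as np

def autocorrelation_lm(S, win=20, nlags=20):
    """AC[l, m]: autocorrelation over the rows of the m-th `win`-frame window."""
    Nf = S.shape[0]
    nwin = Nf // win
    AC = np.zeros((nlags, nwin))
    for m in range(nwin):
        rows = S[m * win:(m + 1) * win, :]   # the `win` rows of window m
        for l in range(nlags):
            AC[l, m] = (rows[:, :Nf - l] * rows[:, l:]).mean()
    return AC

AC = autocorrelation_lm(np.ones((40, 40)))
```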
Assignment
10. Implement the computation of the rhythm index AC(l,m) defined in (11). Plot the image AC in
false color as a function of the lag l = 0,...,19 (y-axis) and the time window index m (x-axis) for the
12 tracks. Comment on the variation of rhythm patterns for the different tracks.
10. MATLAB code used to implement AC(l,m) is in section B.3.8.
The following figure shows the rhythmic variation of ‘track396-electronic’. It can be seen that there are
strong rhythmic patterns, but they stay mostly the same, so the chart is mostly blue. Comments
and charts for the rhythmic variations of the other tracks can be found at the end of the report in section A.5.
3 Tonality and Chroma
3.1 The equal-tempered scale
We now investigate the time-frequency structure related to the concepts of tonality and chroma.
Tonality is concerned with the distribution of notes at a given time.
In contrast, the melody characterizes the variations of the spectrum as a function of time.
We first observe that the auditory sensation that is closely related to frequency, pitch, corresponds to a
logarithmic scale of the physical frequency. We have seen examples of such scales in the previous lab:
the bark and the mel scales.
In this lab, we define another logarithmic scale, known as the tempered scale. First, we define a
reference frequency f_0 associated with a reference pitch. While it is customary to use the frequency 440
Hz associated with the pitch A4, we will use f_0 = 27.5 Hz (this is known as A0, the lowest note on a piano).
The tempered scale introduces 12 frequency intervals, known as semitones, between this reference
frequency f_0 and the next frequency also perceived as an A, 2f_0. As a result, a frequency f_p that is p
semitones away from f_0 is given by

f_p = f_0 · 2^(p/12)  (12)

An interval of 12 semitones, [f_p, f_{p+12}), corresponds to an octave. The same notes (e.g. A, or C#) are
always separated by an octave. For instance, the two previous A notes, A0 and A4, are separated by 4
octaves, since 440 = 2^4 × 27.5. The notion of a note can be formally modeled by the concept of chroma.
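Equation (12) is a one-liner; for example, 48 semitones above A0 lands exactly on A4:

```python
def semitone_frequency(p, f0=27.5):
    """f_p = f0 * 2**(p/12), eq. (12): frequency p semitones above f0."""
    return f0 * 2.0 ** (p / 12.0)
```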
3.2 Chroma
The chroma is associated with the relative position of a note inside an octave. It is a relative
measurement that is independent of the absolute pitch. We describe an algorithm to compute a chroma
feature vector for a frame n. To simplify the notation, we drop the dependency on the frame index, n, in
this discussion.
We first present the idea informally, and then we make our statement precise. We are interested in
mapping a given frequency, f, to a note. Given a reference frequency f_0, we can use the following
equation to compute the distance p between f and f_0, in semitones:

p = round(12 log_2(f/f_0))  (13)

We note that f is usually not exactly equal to f_0 · 2^(p/12) for an integer value of p, which is why we round
12 log_2(f/f_0) to the closest integer.
We can then map this index p to a note, or “chroma”, c, within an octave, using

c = p mod 12  (14)

or equivalently,

p = 12q + c, where 0 ≤ c ≤ 11 and q ∈ {0, 1, 2, ...}  (15)
In other words, f_p and f_{p+12} given by (12) correspond to the same note c, which is exactly what you see if
you look at the keys of a piano.
The problem with this approach is that we do not have an algorithm to extract the frequencies f of the
notes that are being played at a given time. Rather, the Fourier transform of a frame gives a distribution
of the spectral energy at that frame. We need to identify the main peaks in the Fourier transform, and
compute the chroma associated with these frequencies, using the equations above.
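The frequency-to-chroma mapping of (13)-(14) can be sketched as:

```python
import math

def chroma_of(f, f0=27.5):
    """Chroma class (0..11) of frequency f: nearest semitone mod 12, eqs. (13)-(14)."""
    p = round(12.0 * math.log2(f / f0))
    return p % 12
```

With f0 = 27.5 Hz (an A), every A maps to class 0, regardless of octave.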
In the following we describe in detail the different steps of an appropriate algorithm.
3.2.1 Step 1: spectral analysis and peak detection
The goal is to identify the main notes being played, and to remove the noise and the harmonics coming
from the timbre of each instrument.
For each frame n, we compute the windowed FFT as explained in the first lab. We obtain K = N/2 + 1
(where N is even) Fourier coefficients, given by

Y = FFT(w .* x_n);
K = N/2 + 1;
X_n = Y(1:K);  (16)

We detect the local maxima of the magnitude of the Fourier transform |X_n|, and count the number of
maxima, or peaks, n_peaks. Each peak occurs at a particular coefficient value, k, and we let P be the
set of k’s where peaks occur. We denote by f_k, where k ∈ P, the corresponding peak frequencies. The
MATLAB function findpeaks( ) can be used to detect these local maxima.
3.2.2 Step 2: Assignment of the peak frequencies to nearby semitones
Using 27.5 Hz as our value for f_0, for each peak frequency, f_k, we find the two semitones nearest to f_k,
p_L(k) and p_H(k), via

p_L(k) = floor(12 log_2(f_k/f_0))  (17)

and

p_H(k) = ceil(12 log_2(f_k/f_0))  (18)

We then map the index p_L(k) to the note c_L(k) via

c_L(k) = p_L(k) mod 12  (19)

and do the same for p_H(k) to get c_H(k).
3.2.3 Step 3: the Pitch Class Profile: a weighted sum of the semitones
Instead of completely assigning each peak frequency, f_k, to a single semitone, it is beneficial to spread
the effect of f_k over the two neighboring semitones c_L(k) and c_H(k) using a raised-cosine function.
We define the Pitch Class Profile associated with the note, c, as a weighted sum of all the peak
frequencies whose ceilings or floors map to c, irrespective of the octave they fall in (recall that c =
0,...,11).
More precisely, each peak k ∈ P contributes the following amounts to PCP(c_L(k)) and PCP(c_H(k)) (assume
the PCP variables are initialized to zero):

PCP(c_L(k)) = PCP(c_L(k)) + w_L |X_n(k)|^2  (20)

and

PCP(c_H(k)) = PCP(c_H(k)) + w_H |X_n(k)|^2  (21)

where the weights w_L and w_H are computed via

r_L = 12 log_2(f_k/f_0) − p_L(k)  (22)
w_L = cos^2(π r_L / 2)  (23)

and

r_H = p_H(k) − 12 log_2(f_k/f_0)  (24)
w_H = cos^2(π r_H / 2)  (25)

In plain English, PCP(c) tells us how much energy is associated with frequencies that are near to the note
c, across all the octaves.
3.2.4 Step 4: Normalize the Pitch Class Profile
We want our definition of chroma to be independent of the loudness of the music. We therefore need
to normalize each PCP by the total PCP computed over all the semitones. The normalized pitch class
profile (NPCP) is defined by

NPCP(c) = PCP(c) / Σ_{c'=0}^{11} PCP(c')  (26)
11. MATLAB code used to implement the NPCP is in section B.3.9.
12. Below is the NPCP for ‘track201-classical’. The graph shows that A is the most common note.
It is also noted that this track has multiple instruments playing different notes, which makes
sense looking at this graph, with many notes represented as being played simultaneously.
Comments and graphs for the NPCP of the other tracks can be found at the end
of the report in section A.6.
Assignment
11. Implement the computation of the Normalized Pitch Class Profile, defined by (26). You
will compute a vector of 12 entries for each frame n.
12. Evaluate and plot the NPCP for the 12 audio tracks supplied for the first lab:
http://ecee.colorado.edu/~fmeyer/class/ecen4532/audio-files.zip
See Fig. 4 for an example.
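Steps 2-4 can be sketched end-to-end, starting from a list of detected peak frequencies and magnitudes (a NumPy sketch, independent of the normPCP code in Appendix B.3.9; when a peak falls exactly on a semitone, floor and ceiling coincide and the normalization in (26) absorbs the double contribution):

```python
import math
import numpy as np

def npcp_from_peaks(peak_freqs, peak_mags, f0=27.5):
    """Normalized Pitch Class Profile from spectral peaks (Steps 2-4)."""
    pcp = np.zeros(12)
    for fk, mag in zip(peak_freqs, peak_mags):
        x = 12.0 * math.log2(fk / f0)        # semitone distance from f0
        pL, pH = math.floor(x), math.ceil(x) # the two nearest semitones
        wL = math.cos(math.pi * (x - pL) / 2.0) ** 2   # raised-cosine weights
        wH = math.cos(math.pi * (pH - x) / 2.0) ** 2
        pcp[pL % 12] += wL * mag ** 2        # spread the peak's energy over
        pcp[pH % 12] += wH * mag ** 2        # its two neighboring chroma
    return pcp / pcp.sum()                   # normalization, eq. (26)

npcp = npcp_from_peaks([440.0], [1.0])       # a single pure A
```

A peak exactly halfway between two semitones splits its energy 50/50 between the two chroma classes, which is the point of the raised-cosine weighting.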
4 Conclusion
In this lab, we investigated spectral decomposition, similarity matrices, rhythm, and chroma using
the tracks from lab 1. The similarity matrices were able to show us at what points in a song similar
notes were being played, and could be used to organize music by the variety within a track. The
rhythm measures were most helpful for tracks that had a very periodic pattern in the background. The
chroma was able to sort the notes in a track into 12 semitone classes and analyze which note is
predominantly played at different times. Problems 7 and 8 were skipped as I ran out of time to
implement the functions specified in those two problems. The features analyzed in this lab seem to be
more useful than the features investigated in lab 1, as these can be applied to high-frequency tracks
rather than just low-frequency tracks. All code used for this lab can be found in section B.
A. Appendix Figures
A.1 Spectrum Histograms
A.2 Similarity Matrices
As can be seen, this classical track is like the other one: most of the matrix is blue and
green, which shows that many different notes are being played throughout the song.
In these two electronic tracks, it is clear that in track 370 many different notes are being played
through the song, so the matrix has a lot of blue. However, track 396 has lots of repetitive notes
across different frames, so a lot of its matrix is red.
The jazz tracks have similar notes being played throughout, which makes sense, as
there only seems to be one instrument in each. Thus, frames i and j are very
similar.
In track 463, similar notes are played roughly every 100 frames, but in track
492 there are so many notes being played repetitively that it is hard to find a pattern in the
song. All values in the matrix are very close to each other for track 492.
At the beginning of track 547 there are a lot of similar notes being played; this can be heard up to the
middle of the song and can be seen in the figure. For track 550, there isn’t much of a repetitive
pattern, so all the values are around 0.4 to 0.6.
Track 707 is very quiet and does not have many similar notes being played, except around frame 200,
where there is a flute playing a steady note. Track 729 doesn’t have many similar notes being
played throughout, similar to the classical tracks.
A.3 Rhythm Index B(l)
Here it can be seen that track 201 doesn’t have a strong rhythm, while track 204, which is mostly
piano, has a slightly more distinct rhythm, though not by much.
Track 370 doesn’t have a strong rhythm, but it has a few pulses in the middle of the track. This
is very different from track 396, where the rhythm was very strong, so rhythm might not be a good
way to classify electronic music.
Both jazz tracks have weak rhythmic structure and don’t fluctuate as much as the
other tracks.
For track 463, there is a drum playing in the background which gives the strong rhythmic
structure. Track 492 sounds like it has a strong rhythmic structure, but it cannot be seen in this
first estimate.
Track 547 has an electric guitar playing in the background, thus it has strong rhythmic
structure. Track 550 has weak rhythmic structure, which is similar to what I was hearing
for this track.
Track 707 has weak rhythmic structure, as there is only one instrument playing at all times
and it rises in tone gradually. Track 729 is very soft and doesn’t seem to have much rhythm, as
seen in the graph.
A.4 Autocorrelation AC(l)
Compared to the first estimate of these tracks, it can be seen, especially in track 204, that
more rhythm is being detected from the piano. But as usual, track 201 has lots of
instruments playing at once, so there isn’t a strong rhythm.
For these two tracks, the graphs just offer a cleaner structure compared to the first estimate,
but they are essentially the same.
For these two tracks the rhythm is being detected better than in the first estimate, as there is
more fluctuation in track 439; however, the rhythmic structure is still very weak.
For these two tracks not much changed from the first estimate. Both have weak rhythmic
structure.
For these two rock tracks, the presence of the electric guitar is clearer now, and track 550
still has weak rhythmic structure.
For the world tracks, it can be seen that they are about the same as the first estimate, so the same
analysis applies to these better estimates.
A.5 Autocorrelation AC(l,m)
Track 201 doesn’t have many different rhythm patterns, as can be seen from the mostly
blue image. Track 204 has a lot of variation in rhythm patterns, and the image is a visual
representation of the fluctuations of rhythm in the track.
For track 370, you can see from the color changes that there are different rhythm patterns that
transition quickly. Track 396 shows similar behavior, but as its rhythm patterns are strong, it is
mostly blue.
For tracks 437 and 439, it is clear there is not much variation in rhythm patterns;
looking at the better estimate of the rhythm, these tracks just go straight across.
The metal tracks have a bit more fluctuation in the colors than the jazz tracks; however, there is not
too much variation in the rhythm patterns.
Track 547 has a strong rhythmic pattern; however, it does not vary much, so it is mostly
blue. Track 550 has a very weak rhythmic pattern and doesn’t vary much, so it is also mostly
blue.
Both world tracks don’t really have a rhythmic pattern or variation in rhythmic pattern, so they
are mostly blue.
A.6 NPCP
Track 201 has the majority of notes being A, C and F, which makes sense as the piece is in A
minor. Track 204 is simpler to follow, as it is clear which notes are being played at each frame
since only the piano is playing.
Track 370 is very hard to follow, and it is unclear what notes are being played, with the chroma
being all over the place at different frames, likely because the music is produced by a
computer. Track 396 is a bit easier to follow, as C, D, and F seem to be the major notes in this
piece.
Track 437 is easy to follow, with the main notes being C and F, which makes sense as there is only
one instrument playing for most of the song. Track 439 is similar to track 437, as C and F are the
main notes in the piece.
Track 463 is difficult to follow, as its chroma is all over the place, which makes sense after
listening to it, while track 492 has mainly F.
Track 547 starts at F, then moves to C around 400 frames, with another heavy section of C
starting around frame 800. Track 550 seems to be all over the place, with only D being clear around
500 frames.
Track 707 has a clear F note throughout the track, while track 729 is all over the place, with
quite a few notes represented in the chroma.
B. MATLAB Code
B.1 MATLAB code used to compile figures except for
AutoCorrelation(l,m)
lab2.m
%get figures for all parts of report
clear all;
close all;
clc;
fileArray = cellstr([
'track201-classical.wav '; ...
'track204-classical.wav '; ...
'track370-electronic.wav'; ...
'track396-electronic.wav'; ...
'track437-jazz.wav '; ...
'track439-jazz.wav '; ...
'track463-metal.wav '; ...
'track492-metal.wav '; ...
'track547-rock.wav '; ...
'track550-rock.wav '; ...
'track707-world.wav '; ...
'track729-world.wav '
]);
parfor i=1:12
filename=['' fileArray{i} ''];
fbank=melBank(filename);
[song, fs] = audioread(filename);
ExtractedSong=extractSound(filename);
Xk=freqDist(ExtractedSong);
mfccp=mfcc(fbank,Xk);
specHistogram=spectrumHistogram(filename,mfccp,1);
sim=simMatrix(mfccp,filename,1);
rhythmIndex(filename,sim,1);
medS = medianS(sim);
autoC(filename,sim,1);
%%ARm=rhythmVar(filename,sim,1);
NPCP=normPCP(filename,ExtractedSong,1);
end
freqDist.m
function [ Xk ] = freqDist(song)
%freqDist Computes the windowed FFT (frequency distribution) of each frame
frames_overlap = buffer(song,512,256);
w = kaiser(512);
N = 512;
K = N/2+1;
nFrames = size(frames_overlap,2); % number of frames (columns), not max(size)
Xk = zeros(K,nFrames);            % preallocate
for i = 1:nFrames
    Y = fft(w.*frames_overlap(:,i));
    Xk(1:K,i) = Y(1:K);
end
end
B.2 MATLAB code used to compile figures for AutoCorrelation(l,m)
lab2.m
%get figures for all parts of report
clear all;
close all;
clc;
fileArray = cellstr([
'track201-classical.wav '; ...
'track204-classical.wav '; ...
'track370-electronic.wav'; ...
'track396-electronic.wav'; ...
'track437-jazz.wav '; ...
'track439-jazz.wav '; ...
'track463-metal.wav '; ...
'track492-metal.wav '; ...
'track547-rock.wav '; ...
'track550-rock.wav '; ...
'track707-world.wav '; ...
'track729-world.wav '
]);
parfor i=1:12
filename=['' fileArray{i} ''];
fbank=melBank(filename);
[song, fs] = audioread(filename);
ExtractedSong=extractSound(filename, 1);
Xk=freqDist(ExtractedSong);
mfccp=mfcc(fbank,Xk);
%%specHistogram=spectrumHistogram(filename,mfccp,1);
sim=simMatrix(mfccp,filename,1);
%%rhythmIndex(filename,sim,1);
%%medS = medianS(sim);
%%autoC(filename,sim,1);
ARm=rhythmVar(filename,sim,1);
%%NPCP=normPCP(filename,ExtractedSong,1);
end
freqDist.m
function [ Xk ] = freqDist(song)
%freqDist Computes the windowed FFT (frequency distribution) of each frame
frames_overlap = buffer(song,512,256);
w = kaiser(512);
N = 512;
K = N/2+1;
% Drop the last 468 frames (hard-coded for these 24 s extracts so the
% frame count suits the AC(l,m) windowing)
nFrames = size(frames_overlap,2) - 468;
Xk = zeros(K,nFrames);            % preallocate
for i = 1:nFrames
    Y = fft(w.*frames_overlap(:,i));
    Xk(1:K,i) = Y(1:K);
end
end
B.3 MATLAB functions
B.3.1 melBank.m
function [ fbank ] = melBank(filename)
%melBank Creates a set of mel filter banks
% Implement the computation of the triangular filterbanks
% Hp, p = 1,...,NB. Your function will return an array fbank of size
% NB x K such that fbank(p,:) contains the filter bank Hp.
[~,fs]=audioread(filename);
N=512;
K=N/2+1;
nbanks = 40; %% Number of Mel frequency bands
% linear frequencies
linFrq = 20:fs/2;
% mel frequencies
melFrq = log ( 1 + linFrq/700) *1127.01048;
% equispaced mel indices
melIdx = linspace(1,max(melFrq),nbanks+2);
% From mel index to linear frequency
melIdx2Frq = zeros (1,nbanks+2);
% melIdx2Frq (p) = \Omega_p
for i=1:nbanks+2
[~,indx] = min(abs(melFrq - melIdx(i)));
melIdx2Frq(i) = linFrq(indx);
end
fbank=zeros(nbanks,K);
%mapping frequencies banks
fbank_freq = linspace(0, 5512, 257);
for i = 2:nbanks+1
range = 2/(melIdx2Frq(i+1) - melIdx2Frq(i-1));
for j = 1:K
filt_val = range;
if (fbank_freq(j)<=melIdx2Frq(i))&&(fbank_freq(j)>melIdx2Frq(i-1))
filt_val = filt_val*((fbank_freq(j)-melIdx2Frq(i-...
1))/(melIdx2Frq(i) - melIdx2Frq(i-1)));
fbank(i-1,j) = filt_val;
elseif (fbank_freq(j) < melIdx2Frq(i+1)) && (fbank_freq(j) >=...
melIdx2Frq(i))
filt_val = filt_val*((melIdx2Frq(i+1) -...
fbank_freq(j))/(melIdx2Frq(i+1) - melIdx2Frq(i)));
fbank(i-1,j) = filt_val;
else
fbank(i-1,j) = 0;
end
end
end
if(nargin)
% h=figure;
plot(fbank.');
title('Mel Filter Bank');
xlabel('Frequency (Hz)');
ylabel('Filter Magnitude');
xlim([0,length(fbank)]);
saveas(gca,'melFilterBank.png');
% close(h);
end
end
B.3.2 mfcc.m
function [ mfccp ] = mfcc( fbank,Xn,~)
%mfcc Computes the Mel frequency coefficients
% The mel-spectrum (MFCC) coefficient of the n-th frame is defined for
% p = 1,...,NB
narginchk(2, 3);
% Xn=freqDist(filename);
mfccp=(fbank*abs(Xn)).^2;
% Lets us still use non-log mfcc (if needed)
if(nargin==3)
mfccp=10*log10(mfccp);
return;
end
end
extractSound.m
function [ soundExtract,p ] = extractSound( filename, time)
%extractSound Extracts time (in seconds) from the middle of the song
% Write a MATLAB function that extract T seconds of music from a
% given track. You will use the MATLAB function audioread to
% read a track and the function play to listen to the track.
narginchk(1, 2);
if(nargin == 1)
time = 24;
end
info = audioinfo(filename);
song=audioread(filename);
if time >= info.Duration
soundExtract=song;
if(nargout == 2)
p=audioplayer(soundExtract,info.SampleRate);
end
return;
elseif time<= 1/info.SampleRate
error('Too small of a time to sample');
end
samples=time*info.SampleRate;
soundExtract=song(floor(info.TotalSamples/2)-floor(samples/2):1: ...
floor(info.TotalSamples/2)+floor(samples/2));
if(nargout == 2)
p=audioplayer(soundExtract,info.SampleRate);
end
end
B.3.3 spectrumHistogram.m
function [ specHistogram ] = spectrumHistogram(filename,mfccp, ~)
%spectrumHistogram Constructs a 50x40 spectrum histogram
% Construct a two-dimensional spectrum histogram (a matrix of counts),
% with 40 columns - one for each mfcc index, and 50 rows - one for each
% amplitude level, measured in dB, from -20 to 60 dB. You will normalize
% the histogram, such that the sum along each column (for each "note")
% is one.
mfccp=10*log10(mfccp);
[a,~]=size(mfccp);
if(a ~= 40)
errorStruct.message = 'mfccp should have 40 rows';
errorStruct.identifier = 'SpectrumHistogram:WrongSize';
error(errorStruct);
end
bottom=min(min(mfccp));
top=max(max(mfccp));
specHistogram = zeros(50,40);
dbSlots = linspace(bottom,top,51);
for i=1:50 %50
for j=1:a % 40 (numBanks)
specHistogram(i,j)=numel(find(mfccp(j,:)>=dbSlots(i)...
& mfccp(j,:)<dbSlots(i+1)));
end
end
% Count the maximum value in the top bin
specHistogram(50,:)=specHistogram(50,:)+sum(mfccp==top,2).';
% Normalize function between 0 and 1
for i=1:a %40
specHistogram(:,i)=specHistogram(:,i)/sum(specHistogram(:,i));
end
% Option to plot and save figure
if(nargin == 3)
h=figure;
imagesc(specHistogram);
ax=gca;
xlabel('MFCC index');
ylabel('Amplitude level (dB)');
title({'Spectrum Histogram:'; filename});
ax.YTick = flipud(linspace(0,50,11));
ax.YTickLabel = fliplr({'-20 dB' '-12 dB' '-4 dB' '4 dB' '12 dB'...
'20 dB' '28 dB' '36 dB' '44 dB' '52 dB' '60 dB'});
c=colorbar;
colormap 'jet';
c.Label.String = 'Fraction of frames per amplitude bin';
saveas(gca,['specHistogram' filename(6:end-4) '.png']);
close(h);
end
end
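The histogram construction reduces to a per-column `np.histogram` followed by column normalization. A sketch, with random data standing in for the dB-scaled MFCCs:

```python
import numpy as np

def spectrum_histogram(mfcc_db, n_levels=50):
    """Column-normalized 2-D histogram: one column per MFCC band,
    one row per amplitude level between the data's min and max."""
    edges = np.linspace(mfcc_db.min(), mfcc_db.max(), n_levels + 1)
    n_banks = mfcc_db.shape[0]
    hist = np.zeros((n_levels, n_banks))
    for j in range(n_banks):
        hist[:, j], _ = np.histogram(mfcc_db[j, :], bins=edges)
    return hist / hist.sum(axis=0)   # each column sums to one

rng = np.random.default_rng(0)
H = spectrum_histogram(rng.normal(size=(40, 200)))
print(H.shape)   # (50, 40)
```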
B.3.4 simMatrix.m
function [ sim ] = simMatrix(mfccp,filename,~)
%simMatrix displays the similarity matrix for the mel coefficients
% The similarity matrix is defined as follows:
% S(i,j) = sum_{k=1}^{nbanks}{
% (mfcc(k,i)*mfcc(k,j))/(norm(mfcc(:,i)) * norm(mfcc(:,j)))
% fbank=melBank();
% mfccp=mfcc(fbank,filename);
[a,b]=size(mfccp);
% Preallocate matrix
sim=zeros(b,b);
normI=sqrt(sum(abs(mfccp).^2,1));
parfor i=1:b %frame i
for j=1:b %frame j
for k=1:a %fbank k
sim(i,j)=sim(i,j)+...
(mfccp(k,i)*mfccp(k,j)/(normI(j).*normI(i)));
end
end
end
if(nargin == 3)
h=figure;
imagesc(sim);
xlabel('Frame Number');
ylabel('Frame Number');
title({'Similarity Matrix:'; filename});
colorbar;
colormap 'jet';
saveas(gca,['SimMatrix' filename(6:end-4) '.png']);
close(h);
end
end
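The triple loop in simMatrix.m computes the cosine similarity between every pair of frames; the same matrix falls out of a single normalized Gram-matrix product. A NumPy sketch:

```python
import numpy as np

def sim_matrix(mfccp):
    """Cosine similarity between all pairs of frames (columns of mfccp)."""
    norms = np.linalg.norm(mfccp, axis=0)
    return (mfccp.T @ mfccp) / np.outer(norms, norms)

# Three toy frames: the first two orthogonal, the third in between
M = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
S = sim_matrix(M)
print(S[0, 1], S[0, 2])   # 0.0 and ~0.707
```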
B.3.5 rhythmIndex.m
function [ B ] = rhythmIndex( filename,sim,~)
%rhythmIndex Identifies rhythmic periods
% B(l) is the mean of the entries on the l-th upper diagonal.
% The index l associated with the largest value of B(l) corresponds to
% the presence of a rhythmic period of l*K/f_s seconds
% sim = simMatrix( filename);
[len,~]= size(sim);
B=zeros(1,len);
parfor l=0:len-1
B(l+1)=(1/(len-l))*sum(diag(sim,l));
end
if(nargin==3)
h=figure;
plot(B);
xlim([0,len]);
title({'Rhythm Index B(l):'; filename});
xlabel('lag (frames)');
ylabel('Presence of rhythmic period');
saveas(gca,['RhythmIndex' filename(6:end-4) '.png']);
close(h);
end
end
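B(l) is just the mean of the l-th upper diagonal of the similarity matrix; in NumPy the whole function collapses to one line per lag:

```python
import numpy as np

def rhythm_index(S):
    """B[l]: mean of the l-th upper diagonal of the similarity matrix."""
    return np.array([np.diagonal(S, l).mean() for l in range(S.shape[0])])

# A matrix with strong lag-1 structure
S = np.eye(4) + np.diag([0.5, 0.5, 0.5], 1)
B = rhythm_index(S)
print(B)   # [1.  0.5 0.  0. ]
```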
B.3.6 medianS.m
function[medSim] = medianS(sim)
%medianS Computes the median of the similarity matrix S(i,j)
[i,j]=size(sim);
% Reshape and calculate the median of the similarity matrix
medSim = median(reshape(sim, [1, i*j]));
end
B.3.7 autoC.m
function [ AC ] = autoC( filename,sim,~ )
%autoC Computes the autocorrelation of a song
% In general, if two frames i and j are similar, we can find out
% if they are repeated later in the segment, at time j+l
info = audioinfo(filename);
% sim = simMatrix( filename);
[len,~]=size(sim);
AC=zeros(1,len);
parfor l=0:len-1
for i=1:len
for j=1:len-l
AC(l+1)=AC(l+1)+(sim(i,j)*sim(i,j+l));
end
end
AC(l+1)=AC(l+1)*(1/(len*(len-l)));
end
if(nargin==3)
h=figure;
xAxis=linspace(0,len/info.SampleRate,len);
plot(xAxis,AC);
xlim([0,len/info.SampleRate]);
title({'Autocorrelation AC(l):'; filename});
xlabel('Lag (secs)');
ylabel('Autocorrelation');
saveas(gca,['AutoC' filename(6:end-4) '.png']);
close(h);
end
end
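The nested loops above implement AC(l) = (1/(N(N-l))) * sum over i, j of S(i,j)S(i,j+l). The inner two loops vectorize into one elementwise product per lag:

```python
import numpy as np

def auto_corr(S):
    """AC[l] = (1/(N*(N-l))) * sum over i, j of S[i, j] * S[i, j+l]."""
    n = S.shape[0]
    AC = np.empty(n)
    for l in range(n):
        AC[l] = (S[:, :n - l] * S[:, l:]).sum() / (n * (n - l))
    return AC

AC = auto_corr(np.eye(3))   # identity matrix: only lag 0 correlates
print(AC)                    # [0.333..., 0, 0]
```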
B.3.8 rhythmVar.m
function [ AClm ] = rhythmVar( filename,sim,~ )
%rhythmVar Computes the short-time autocorrelation AC(l,m)
% In general, if two frames i and j are similar, we can find out
% if they are repeated later in the segment, at time j+l
% For the sake of this lab, we will be processing 20 frames (~1 second)
[len,~]=size(sim);
nWin=floor(len/20)-1;
% Preallocate the output and the per-window product matrix
% (prodMat avoids shadowing the built-in function prod)
AClm=zeros(nWin,20);
prodMat=zeros(20,20);
for l = 1:20
for m = 1:nWin
for i = 1:20
for j = 1:20-l+1
prodMat(i,j) = sim(i+m*20,j+m*20)*sim(i+m*20,j+m*20+l-1);
end
end
% j takes 21-l values, so normalize by 20*(21-l);
% 20*(20-l) would divide by zero at l = 20
AClm(m,l) = 1/(20*(21-l))*sum(sum(prodMat));
prodMat = zeros(20,20);
end
end
if(nargin==3)
h=figure;
imagesc(AClm);
xlim([0, 20]);
ylim([0.5, 1.5])
title({'Autocorrelation AC(l,m):'; filename});
xlabel('Time (secs)');
ylabel('Lag (secs)');
colormap 'jet';
colorbar;
saveas(gca,['RhythmVar' filename(6:end-4) '.png']);
close(h);
end
end
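The windowed autocorrelation can be sketched as below. This is a simplified illustration, not the lab's code: it uses non-overlapping W-frame blocks, a lag index starting at 0, and a denominator that never reaches zero.

```python
import numpy as np

def rhythm_var(S, W=20):
    """AC(m, l): autocorrelation at lag l within the m-th W-frame block.
    Row-to-row changes indicate rhythmic variation over time."""
    n_win = S.shape[0] // W
    AC = np.zeros((n_win, W))
    for m in range(n_win):
        blk = S[m * W:(m + 1) * W, m * W:(m + 1) * W]
        for l in range(W):
            AC[m, l] = (blk[:, :W - l] * blk[:, l:]).sum() / (W * (W - l))
    return AC

AC = rhythm_var(np.eye(40), W=20)
print(AC.shape)   # (2, 20)
```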
B.3.9 normPCP.m
function [PCP] = normPCP( filename,x,~)
%normPCP Computes the normalized pitch class profile (chroma)
% Spectrogram peaks are assigned to the twelve semitone classes of the
% equal-tempered scale and weighted by how close each peak lies to an
% exact semitone; each frame's profile is then normalized to sum to one.
info=audioinfo(filename);
% x=extractSound(filename);
[s,~,~]=spectrogram(x,kaiser(512),256,512,info.SampleRate,'yaxis');
s=abs(s);
[a,b]=size(s);
locs=zeros(a,b);
peaks=zeros(a,b);
for i=1:b
len=length(findpeaks(s(:,i)));
[peaks(1:len,i),locs(1:len,i)]=findpeaks(s(:,i));
end
freqVals=(5512/257)*locs; % FFT bin index -> Hz (Nyquist ~5512 Hz over 257 bins)
f0=27.5;
sm=round(12*log2(freqVals/f0));
r=12*log2(freqVals/f0)-sm;
c=mod(sm,12);
[~,B]=size(c); % 257,1032
w=(cos(pi*r/2)).^2;
% Zero-padded (non-peak) rows produce NaN weights; set them to zero
w(isnan(w))=0;
% k is the number of peaks
% c is the chroma (A,A#,B,etc.)
% n is the frame number
% Weighting function, w, needs to be k*n*c large
PCP=zeros(12,B);
% Pick out semitones
% Loop through each value of c (1,2,...,12)
% Find the corresponding rows to extract from w where semitone is
% (1,2,...,12) multiply by Xn at that row at the corresponding row
peaks=peaks.^2;
for i=1:B %nFrames
for j=0:11
a=find(c(:,i)==j);
PCP(12-j,i)=sum(w(a,i).*peaks(a,i));
end
end
PCP=bsxfun(@rdivide,PCP,sum(PCP));
PCP = 10*log10(PCP/(max(max(x))));
if(nargin==3)
h=figure;
imagesc(PCP);
ax=gca;
ax.YTickLabel=fliplr({'A ','A#','B ','C ','C#','D ','D#','E ',...
'F ','F#','G ','G#'});
ax.YTick=linspace(1,12,12);
colormap 'jet';
title({'Chroma :'; filename});
xlabel('frames');
ylabel('Note');
colorbar;
saveas(gca,['NPCP' filename(6:end-4) '.png']);
close(h);
end
end
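Steps 2 and 3 of the chroma computation (semitone assignment and cosine-squared weighting) can be sketched as follows, using the A0 = 27.5 Hz reference from the code above:

```python
import numpy as np

A0 = 27.5   # reference frequency of A0, in Hz

def chroma_of(freqs):
    """Assign peak frequencies to chroma classes 0-11 (0 = A) with
    cosine-squared weights favouring peaks near exact semitones."""
    st = 12 * np.log2(np.asarray(freqs, dtype=float) / A0)
    sm = np.round(st)             # nearest semitone above A0
    r = st - sm                   # residual, in semitones
    c = np.mod(sm, 12).astype(int)
    w = np.cos(np.pi * r / 2) ** 2
    return c, w

c, w = chroma_of([440.0, 466.16])   # A4 and (approximately) A#4
print(c)   # [0 1]
```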